
\documentclass[11pt]{article} % use larger type; default would be 10pt
\usepackage{framed}
\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)
\usepackage{geometry} % to change the page dimensions
\geometry{a4paper} % or letterpaper (US) or a5paper or....
% \geometry{margin=2in} % for example, change the margins to 2 inches all round
% \geometry{landscape} % set up the page for landscape
%   read geometry.pdf for detailed page layout information

\usepackage{graphicx} % support the \includegraphics command and options

% \usepackage[parfill]{parskip} % Activate to begin paragraphs with an empty line rather than an indent

%%% PACKAGES
\usepackage{booktabs} % for much better looking tables
\usepackage{array} % for better arrays (eg matrices) in maths
\usepackage{paralist} % very flexible & customisable lists (eg. enumerate/itemize, etc.)
\usepackage{verbatim} % adds environment for commenting out blocks of text & for better verbatim
\usepackage{subfig} % make it possible to include more than one captioned figure/table in a single float
% These packages are all incorporated in the memoir class to one degree or another...
\usepackage{framed}

%%% HEADERS & FOOTERS
\usepackage{fancyhdr} % This should be set AFTER setting up the page geometry
\pagestyle{fancy} % options: empty , plain , fancy
\renewcommand{\headrulewidth}{0pt} % customise the layout...
\lhead{}\chead{}\rhead{}
\lfoot{}\cfoot{\thepage}\rfoot{}

%%% SECTION TITLE APPEARANCE
\usepackage{sectsty}
\allsectionsfont{\sffamily\mdseries\upshape} % (See the fntguide.pdf for font help)
% (This matches ConTeXt defaults)

%%% ToC (table of contents) APPEARANCE
\usepackage[nottoc,notlof,notlot]{tocbibind} % Put the bibliography in the ToC
\usepackage[titles,subfigure]{tocloft} % Alter the style of the Table of Contents
\renewcommand{\cftsecfont}{\rmfamily\mdseries\upshape}
\renewcommand{\cftsecpagefont}{\rmfamily\mdseries\upshape} % No bold!
\begin{document}
	

%=================================================================%
\newpage
\section{ML Week 8 : Machine Learning Clustering}

Suppose you have an unlabelled set of observations$\{ x^{(1)},x^{(2)},x^{(3)},x^{(4)}, \ldots ,x^{(n)}\}$.
You run $k-$means with 50 random initializations and obtain 50 different clusterings solution of the data.
What is the recommended way of choosing which solution to use.

\subsection{The Distortion Function}
For each of the clustering solutions, the distortion function is computed as 
\[ \frac{1}{m} \sum^{m}_{i=1} \parallel x^{(i)} - \mu_c^{(i)}\parallel ^2 \]
The optimal clustering solution is the solution that minizes the distortion function.
Since a lower value for a distortion function, implies a better clustering, you should choose the clustering with 
smallest value of the distortion function.
%----------------------------------------------------------------------------------%
\subsection*{Finding Closest Centroids}

In the cluste assignment phase of the algorithm, the algorithm assigns every training example $x^{(n)}$ to the closest 
centroid, given the current position of the centroids.

\[ C^{(i)} := J \mbox{ that minimizes } \| x^{} - \mu_j \| ^2 \]

\begin{itemize}
	\item $C^{(i)}$ index of the centroid clostest to $x^{(i)}$
	\item $\mu_j$ is the position of the $j-$th centroid.
\end{itemize}

%---------------%
\subsection{Computing Centroid Means}

\[ 
\mu_k := \frac{1}{abs(C_k)} \sum_{i \in C_k} x^{(i)}
\]
%---------------%

A good way to initialize k-mean is to select $k$ distinct examples from the training set and set the cluster centroids equal to these selected examples.

On every iteration of $k-$means, the cost function $J(C^{(1)},C^{(2)},\ldots, C^{(n)},
\mu_1,\ldots \mu_k)$

The distortion function should either stay the same or decrease, in particular it should not increase.


\section{Week 7 Unsupervised Learning}
\subsection{K-means Clustering}


\begin{itemize}
	\item In this this exercise, you will implement the K-means algorithm and use it for image compression.
	
	\item You will first start on an example 2D dataset that will help you gain an intuition of
	how the K-means algorithm works. After that, you wil use the K-means algorithm for image
	compression by reducing the number of colors that occur in an image to only those that are most common in that image.
	
	\item You will be using ex7.m for this part of the exercise.
	
\end{itemize}

\subsection{Implementing K-means}
\begin{itemize}
	\item The K-means algorithm is a method to automatically cluster similar data examples together.
	
	\item The intuition behind K-means is an iterative procedure that starts by guessing the initial centroids, and then refines this guess by repeatedly assigning examples to their closest centroids and then recomputing the centroids based on the assignments.
	
	\item The inner-loop of the algorithm repeatedly carries out two steps: 
	\begin{itemize}
		\item[(i)] Assigning each training example to its closest centroid 
		\item[(ii)] Recomputing the mean of each centroid using the points assigned to it. 
	\end{itemize}
	
	
	\item The K-means algorithm will always converge to some final set of means for the centroids.
	Note that the converged solution may not always be ideal and depends on the initial setting of the centroids. 
	
	\item Therefore, in practice the K-means algorithm is usually run a few times with different random initializations. 
	
	\item One way to choose between these different solutions from different random initializations is to choose the one with the lowest cost function value (distortion).
	
	\item Random initialization
	The initial assignments of centroids for the example dataset in \texttt{ex7.m} were designed so that you will see the same gure as in Figure 1. 
	
	\item In practice, a good strategy for initializing the centroids is to select random examples from the training set.
\end{itemize}
%=================================================================%
\newpage
\subsection*{Question 1. }
For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply.

\begin{itemize}
	\item 
	\item
	\item 
	\item 
	
\end{itemize}
%=================================================================%
\subsection*{Question 2.} 
Suppose we have three cluster centroids $\mu_1$=$[1 2]^T$, $\mu_2$=$[-3 0]^T$ and $\mu_3$=$[4 2]^T$. 
Furthermore, we have a training example x(i)=$[3 1]^T$. After a cluster assignment step, what will $c^{(i)}$ be?

\begin{itemize}
	\item[(1)] $(3-1)^2 + (1-2)^2 $
	\item[(2)] $(3-1)^2 + (1-2)^2 $
	\item[(3)] $(3-1)^2 + (1-2)^2 $
	\item 
	
\end{itemize}

%=================================================================%
\subsection*{Question 3.}
K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?
\begin{itemize}
	\item 
	\item
	\item 
	\item 
	
\end{itemize}

%=================================================================%
\subsection*{Question 4. }
Suppose you have an unlabeled dataset $\{x(1),\ldots,x(m)\}$. You run K-means with 50 different random
initializations, and obtain 50 different clusterings of the data. 
What is the recommended way for choosing which one of these 50 clusterings to use?

%=================================================================%
\subsection*{Question 5. }
Which of the following statements are true? Select all that apply.



%====================================================================%
\newpage

\section{Week 8 Unsupervised Learning}

%-----------------------------------------------%
\subsection{ Question 1. } 
For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply.

YES Given a database of information about your users, automatically group them into different market segments.

YES Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say are frequently purchased together) and thus should be put on the same shelf.

Given historical weather records, predict the amount of rainfall tomorrow (this would be a real-valued output)

Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.
%-----------------------------------------------%
\subsection{ Question 2. }
Suppose we have three cluster centroids $\mu_1$=[1 2], $\mu_2$=[-3 0] and $\mu_3$=[4 2]. 
Furthermore, we have a training example x(i)=[-2 1]. After a cluster assignment step, what will c(i) be?

YES c(i)=1

c(i)=2

c(i)=3

c(i) is not assigned


%-----------------------------------------------%
\subsection{ Question 3. }
K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?

\begin{itemize}
	\item Move each cluster centroid $\mu_k$, by setting it to be equal to the closest training example x(i)
	
	\item YES Move the cluster centroids, where the centroids $\mu_k$ are updated.
	
	\item The cluster assignment step, where the parameters c(i) are updated.
	
	\item YES The cluster centroid assignment step, where each cluster centroid $\mu_i$ is assigned (by setting c(i)) to the closest training example x(i).
	\end{itemize}
	%-----------------------------------------------%
\subsection{ Question 4. }
	Suppose you have an unlabeled dataset {x(1),…,x(m)}. You run K-means with 50 different random
	
	initializations, and obtain 50 different clusterings of the
	
	data. What is the recommended way for choosing which one of
	
	these 50 clusterings to use?
	
	
	SELECTED Compute the distortion function J(c(1),…,c(m),$\mu_1$,…,$\mu_k$), and pick the one that minimizes this.
	
	Use the elbow method.
	
	Manually examine the clusterings, and pick the best one.
	
	Plot the data and the cluster centroids, and pick the clustering that gives the most "coherent" cluster centroids.
	%-----------------------------------------------%
\subsection{ Question 5. } 
	Which of the following statements are true? Select all that apply.
	
	WRONG Once an example has been assigned to a particular centroid, it will never be reassigned to another different centroid
	
	CORRECT On every iteration of K-means, the cost function J(c(1),…,c(m),$\mu_1$,…,$\mu_k$) (the distortion function) should either stay the same or decrease; in particular, it should not increase.
	
	WRONG K-Means will always give the same results regardless of the initialization of the centroids.
	
	CORRECT A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples.
	Submit Quiz
	
	

\section{Principal Component Analysis}
%---------------%
\subsection{Recommended Applications of PCA}

\begin{itemize}
	\item Data Compression: reducing the dimensionof input data $x^{(i)}$, which will be used in a supervised learning algorithm
	(i.e. use PCA data so that your supervised learning algorithm runs faster.
	
	\item Data Visualization: Reduce data to 2D ( or 3D) so that it can be plotted.
\end{itemize}
Close

\subsection{ Question 1. }

Consider the following 2D dataset:


Which of the following figures correspond to possible values that PCA may return for u(1) (the first eigenvector / first principal component)? 
Check all that apply (you may have to check more than one figure).

(Select both parallel vectors - CORRECT)







%====================================================================%

\subsection{ Question 2. }
Which of the following is a reasonable way to select the number of principal components k?

(Recall that n is the dimensionality of the input data and m is the number of input examples.)

\begin{itemize}
	\item Choose the value of k that minimizes the approximation error % 1m∑mi=1||x(i)−x(i)approx||2.
	
	\item Choose k to be the smallest value so that at least 1\% of the variance is retained.
	
	\item Choose k to be 99\% of n (i.e., k=0.99 $\times$ n, rounded to the nearest integer).
	
	\item Choose k to be the smallest value so that at least 99\% of the variance is retained. [CORRECT - Selected]
	
\end{itemize}
%====================================================================%
\subsection{ Question 3. }
Suppose someone tells you that they ran PCA in such a way that "95\% of the variance was retained." What is an equivalent statement to this?

%1m∑mi=1||x(i)−x(i)approx||21m∑mi=1||x(i)||2≥0.95 [INCORRECT Selected]
%
%1m∑mi=1||x(i)−x(i)approx||21m∑mi=1||x(i)||2≤0.05
%
%1m∑mi=1||x(i)||21m∑mi=1||x(i)−x(i)approx||2≥0.95
%
%1m∑mi=1||x(i)−x(i)approx||21m∑mi=1||x(i)||2≥0.05

%====================================================================%
\subsection{ Question 4. }
Which of the following statements are true? Check all that apply.

\begin{itemize}
	\item Given an input x $\in$ $R^n$, PCA compresses it to a lower-dimensional vector z$\in$ $R^k$.
	
	\item PCA can be used only to reduce the dimensionality of data by 1 (such as 3D to 2D, or 2D to 1D).
	
	\item If the input features are on very different scales, it is a good idea to perform feature scaling before applying PCA.
	
	\item Feature scaling is not useful for PCA, since the eigenvector calculation (such as using Octave's svd(Sigma) routine) takes care of this automatically.
\end{itemize}
%====================================================================%
\subsection{ Question 5. }
Which of the following are recommended applications of PCA? Select all that apply.

\begin{itemize}
	\item [CORRECT] Data compression: Reduce the dimension of your data, so that it takes up less memory / disk space.
	
	\item [CORRECT] Data compression: Reduce the dimension of your input data x(i), which will be used in a supervised learning algorithm (i.e., use PCA so that your supervised learning algorithm runs faster).
	
	\item As a replacement for (or alternative to) linear regression: For most learning applications, PCA and linear regression give substantially similar results.
	
	\item Data visualization: To take 2D data, and find a different way of plotting it in 2D (using k=2).
	
\end{itemize}
\end{document

Given an input x∈Rn, PCA compresses it to a lower-dimensional vector z∈Rk.

If the input features are on very different scales, it is a good idea to perform feature scaling before applying PCA.




Which of the following are recommended applications of PCA? Select all that apply.

To get more features to feed into a learning algorithm.

SELECTED Data visualization: Reduce data to 2D (or 3D) so that it can be plotted.

SELECTED Data compression: Reduce the dimension of your input data x(i), which will be used in a supervised learning algorithm (i.e., use PCA so that your supervised learning algorithm runs faster).

Clustering: To automatically group examples into coherent groups.

}
